---
title: "AstraEmbeddingRetriever"
id: astraretriever
slug: "/astraretriever"
description: "This is an embedding-based Retriever compatible with the Astra Document Store."
---

# AstraEmbeddingRetriever

This is an embedding-based Retriever compatible with the Astra Document Store.
                                                                                                                                                                                                                                     |
<div className="key-value-table">

|                                        |                                                                                                                                                                                                                                                                           |
| :------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Most common position in a pipeline** | 1. After a Text Embedder and before a [`PromptBuilder`](../builders/promptbuilder.mdx)  in a RAG pipeline  2. The last component in the semantic search pipeline  3. After a Text Embedder and before an [`ExtractiveReader`](../readers/extractivereader.mdx)  in an extractive QA pipeline |
| **Mandatory init variables**           | `document_store`: An instance of [AstraDocumentStore](../../document-stores/astradocumentstore.mdx)                                                                                                                                                                                            |
| **Mandatory run variables**            | `query_embedding`: A list of floats                                                                                                                                                                                                                                     |
| **Output variables**                   | `documents`: A list of documents                                                                                                                                                                                                                                           |
| **API reference**                      | [Astra](/reference/integrations-astra)                                                                                                                                                                                                                                           |
| **GitHub link**                        | https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/astra                                                                                                                                                                                   |

</div>

## Overview

`AstraEmbeddingRetriever` compares the query and document embeddings and fetches the documents most relevant to the query from the [`AstraDocumentStore`](../../document-stores/astradocumentstore.mdx) based on the outcome.

When using the `AstraEmbeddingRetriever` in your NLP system, make sure it has the query and document embeddings available. You can do so by adding a Document Embedder to your indexing pipeline and a Text Embedder to your query pipeline.

In addition to the `query_embedding`, the `AstraEmbeddingRetriever` accepts other optional parameters, including `top_k` (the maximum number of documents to retrieve) and `filters` to narrow down the search space.

### Setup and installation

Once you have an AstraDB account and have created a database, install the `astra-haystack` integration:

```shell
pip install astra-haystack
```

From the configuration in AstraDB’s web UI, you need the database ID and a generated token.

You will additionally need a collection name and a namespace. When you create the collection name, you also need to set the embedding dimensions and the similarity metric. The namespace organizes data in a database and is called a keyspace in Apache Cassandra.

Then, optionally, install sentence-transformers as well to run the example below:

```shell
pip install sentence-transformers
```

## Usage

We strongly encourage passing authentication data through environment variables: make sure to populate the environment variables `ASTRA_DB_API_ENDPOINT` and  `ASTRA_DB_APPLICATION_TOKEN` before running the following example.

### In a pipeline

Use this Retriever in a query pipeline like this:

```python
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack_integrations.components.retrievers.astra import AstraEmbeddingRetriever
from haystack_integrations.document_stores.astra import AstraDocumentStore

document_store = AstraDocumentStore()

model = "sentence-transformers/all-mpnet-base-v2"

documents = [Document(content="There are over 7,000 languages spoken around the world today."),
						Document(content="Elephants have been observed to behave in a way that indicates a high level of self-awareness, such as recognizing themselves in mirrors."),
						Document(content="In certain parts of the world, like the Maldives, Puerto Rico, and San Diego, you can witness the phenomenon of bioluminescent waves.")]

document_embedder = SentenceTransformersDocumentEmbedder(model=model)
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)

document_store.write_documents(documents_with_embeddings.get("documents"), policy=DuplicatePolicy.SKIP)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder(model=model))
query_pipeline.add_component("retriever", AstraEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "How many languages are there?"

result = query_pipeline.run({"text_embedder": {"text": query}})

print(result['retriever']['documents'][0])
```

The example output would be:

```python
Document(id=cfe93bc1c274908801e6670440bf2bbba54fad792770d57421f85ffa2a4fcc94, content: 'There are over 7,000 languages spoken around the world today.', score: 0.8929937, embedding: vector of size 768)
```

## Additional References

🧑‍🍳 Cookbook: [Using AstraDB as a data store in your Haystack pipelines](https://haystack.deepset.ai/cookbook/astradb_haystack_integration)
